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Кегпе! Support Vector Machine 


Roadmap 


@ Embedding Numerous Features: Kernel Models 









Lecture 2: Dual Support Vector Machine 


dual SVM: another QP with valuable geometric 
messages and almost no dependence on d 





Lecture 3: Kernel Support Vector Machine 


ə Kernel Trick 

e Polynomial Kernel 

e Gaussian Kernel 

e Comparison of Kernels 





Combining Predictive Features: Aggregation Models 
Q Distilling Implicit Features: Extraction Models 
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Кегпе! Support Vector Machine Kernel Trick 


Dual SVM Revisited 


goal: SVM without dependence on d | 


half-way done: 
min Заоа -1Та 
a 
subjectto — y/o = 0; 
Qn > 0,for n— 1,2,..., М 


° Qnm = УпУт2 12 т: inner product in Rd 
° need: 212 = (x4)! (Xm) calculated faster than O(d) 





can we do so? | 


Кегпе! Support Vector Machine Kernel Trick 


Fast Inner Product for b; 
2nd order polynomial transform 


2 2 2 
Фа(Х) = (1, X1, X2,- - - , Xd, Хр. мою, - , X1Xd, 30X1, X, os | KONG X3) 


—include both ху хо & xəxi for ‘simplicity’ :-) 






d d d 
ba(x) bx) = 1-У хх + у у xxxix 
= 


i=1 /=1 


d d d 
/ / "m 
= = j=1 


= 14xlx + (x!x')(xTx') 


for 65, transform + inner product can be 
carefully done in O(d) instead of O(a?) | 


Кегпе! Support Vector Machine Kernel Trick 


Kernel: Transform + Inner Product 


transform Ф <> kernel function: Ко(х,х/) = b(x)! b(x') 
Фә» — Ko, (х,х') = 1 + (xTx') + (x! x’)? 





* optimal bias b? from SV (Xs, ys), 


= 
N N 
b=ys— м "2s SE = ( > авиан Zs — ys — > «луп (К(х„,х.„)) 
n=1 


n=1 


e optimal hypothesis дум: for test input x, 


N 
gsw(X) = sign (w' (x) +b) = sign (X OnyanK (Xn, X) + | 


ПЕ 


Кегпе! trick: plug in efficient Кегпе! function 
to avoid dependence on d | 





Кегпе! Support Vector Machine Kernel Trick 


Kernel SVM with QP 
Hard-Margin M Algorithm 
Ө qnm = YnymK(Xn, Xm); p = —1 y; (A, с) for equ./bound constraints 
Ө a+ ОР(Оь,р, A, с) 

Өр (v - > anynk(xo x) ) with SV (xs, ys) 


SV indices n 
Ө return SVs and their an as well as b such that for new x, 


gsvw (X) = sign ( > atnYnK (Xn, Xx) + b) 


SV indices n 








° (1): time complexity O(N?) - (kernel evaluation) 
E (2): QP with N variables and N + 1 constraints 
° (3) & (4): time complexity O(;4SV) - (kernel evaluation) 





kernel SVM: 1 
use computational shortcut to avoid d 8 predict with SV only | 


Кегпе! Support Vector Machine Kernel Trick 


Fun Time 


Consider two examples x and x’ such that х/х’ = 10. What is 
Ko, (X, X')? 

@ | 

Ө 11 


Ө 111 
Ө 1111 





Кегпе! Support Vector Machine Kernel Trick 


Fun Time 


Consider two examples x and x’ such that х/х’ = 10. What is 
Ko, (x, X')? 

© | 

Ө 11 

Ө 111 

Ө 1111 


Reference Answer: 9 


Using the derivation in previous slides, 
Ko, (X, X') 21 + x'x' + (x!x')?. 
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Кегпе! Support Vector Machine Polynomial Kernel 


General Poly-2 Kernel 


Ф>(Х) = (1,36,...,X9,32,..., x5) © Ke,(x,x') = 1 +x!x' + (x! x')2 
ф(х) = (1, V2xi,..., V2xq XD. .... XS) & К(х,х')=1+2х'х' + (хх)? 


Фо) = Пу. жи 
€ Ko(x,x’) = 1 + 29x! x' + 7?(xTx’)? 





Ko(x,x’) = (1 + yx! x'? with y > 0 
e K>: somewhat ‘easier’ to calculate than Ко, 


о Ф> and Ф: equivalent power, 
different inner product — different geometry 





Kə commonly used | 


Кегпе! Support Vector Machine Polynomial Kernel 


Poly-2 Kernels in Action 





© 
о о 
о 00999 
a х 
х 5 х х 
х 
х 
bX ж “ x 5 
х 





























(1 + 0.001x!x')2 











9 Jsvm different, | SVs | different 
—‘hard’ to say which is better before learning 


e change of kernel = change of margin definition 

















need selecting K, just like selecting | 
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Кегпе! Support Vector Machine Polynomial Kernel 


General Polynomial Kernel 


Ko(xX,X) = (C+ yx!x')2 with y > 0,¿ > 0 
Ka(x,X) = (¿+ x!x') with y > 0,¿ > 0 
Ко(х,х') = (¿+ xlx')9 with y > 0,¿ > 0 








e embeds Фо specially with parameters 
О) 

e allows computing large-margin polynomial 

classification without dependence on d 
















SVM + Polynomial Kernel: Polynomial SVM |» хў x 
x 


10-th order polynomial 
with margin O.1 
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Кегпе! Support Vector Machine Polynomial Kernel 


Special Case: Linear Kernel 


I (x) хх, 


Ko(x,x') = (¿+ x'x’)® with y > 0,¿ > 0 





e Кү: just usual inner product, called 
linear kernel 

e ‘even easier’: can be solved (often in 

primal form) efficiently 





linear first, remember? :-) | 
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Кегпе! Support Vector Machine Polynomial Kernel 


Fun Time 


Consider the general 2-nd polynomial kernel K2(x, x’) = ( + yx! x')*. 
Which of the following transform can be used to derive this kernel? 


© (x) = (1, /29ж,..., У2уха, x5, «5 7х) 

эхо |.  X5) 

Ө Ф(х) = (С, V270%1,..., V270xa, х2,..., x8) 
(1222 20 207) 








Кегпе! Support Vector Machine Polynomial Kernel 


Fun Time 


Consider the general 2-nd polynomial kernel K2(x, x’) = ( + yx! x')*. 


Which of the following transform can be used to derive this kernel? 


© (x) = (1, /29ж,..., V2YXa, x5, . .. тх) 
эхо 2хх с 

Ө Ф(х) = (С, V270%1,..., /2лСХа,Х2,..., x8) 
Ө 9(x) = (6, V 2yG, ..., VAC Xa, 3X2, XH) 








Reference Answer: (4) 


We need to have С? from the 0-th order terms, 
20x! х! from the 1-st order terms, and 
ож) from the 2-nd order terms. 
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Кегпе! Support Vector Machine Gaussian Kernel 


Kernel of Infinite Dimensional Transform 
infinite dimensional Ф(х)? Yes, if K(x, x’) efficiently computable! 





when x = (х), K(xx) = 


with infinite dimensional Ф(х) = exp(—x?) - (1, ique Wc ) 


exp(—(x = x’)?) 
exp( — (x)°)exp( — (х)Р)ехр(2хх') 


өхр(—(х)®)ехр(—(Х/)/) = em 











0 
(оон S (хуку 
1-0 э 
o) ol) 


more generally, Gaussian kernel 
K(x, x') = exp (—y||x — x'||?) with y > 0 


Кегпе! Support Vector Machine Gaussian Kernel 


Hypothesis of Gaussian SVM 


Gaussian kernel K(x, x’) = exp (—7||x — x'||2) 


9зум(Х) 


sign z олулК(Хл,Х) + ) 


SV 


= sign (= AnYn€xp [l = хл?) d | 


SV 


e linear combination of Gaussians centered at SVs x, 
e also called Radial Basis Function (АВР) kernel 





Gaussian SVM: 
find a, to combine Gaussians centered at X» 
& achieve large margin in infinite-dim. space 






Кегпе! Support Vector Machine Gaussian Kernel 


Support Vector Mechanism 


large-margin 

hyperplanes 

+ higher-order transforms with kernel trick 
# not many 

boundary sophisticated 



















e transformed vector z = Ф(х) => efficient kernel K(x, x’) 
e store optimal w — store а few SVs and ол 


new possibility by Gaussian SVM: 
infinite-dimensional linear classification, with 
generalization ‘guarded by’ large-margin :-) 






Кегпе! Support Vector Machine Gaussian Kernel 


Gaussian SVM in Action 
































о о о 
о 
° ° 
o 9 909 co ° 
со Go OQ 
cf? og ® ° 
оо © 
x x о © 
ü= >) Ss S3 
exp(—1||x — x'|2) ехр(-10|х-3х 2) ^ exp(-100|x — x'|2) 
e large y => sharp Gaussians => оуегїїг? 
e warning: SVM can still overfit :-( | 





Gaussian SVM: need careful selection of ^ | 
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Кегпе! Support Vector Machine Gaussian Kernel 


Fun Time 


Consider the Gaussian kernel K(x, x’) = exp(—^||x — х/||2). What 
function does the kernel converge to if y — oc? 


© Kim(x, x’) = 0 
Ө Kim(x, x 2) [x — он x'] 
© Kim(x, X) = [x Z X] 
© Kim(x, X) = 1 





Кегпе! Support Vector Machine Gaussian Kernel 


Fun Time 


Consider the Gaussian kernel K(x, x’) = ехр(-э Х-3Х 2). What 


function does the kernel converge to if y — со? 


© Kiim(x, x’) = 0 
Ө Къ (х, X) = [х=х] 
@ Kiim(X, x’) = [х = Х| 
o Кі (х, x’) = 1 





Reference Answer: (2) 


If x = x’, К(х, X) = 1 regardless of y. If x Z x', 
K(x, X) = 0 when y — со. Thus, Kim is an 
impulse function, which is an extreme case of 
how the Gaussian gets sharper when y — со. 
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Кегпе! Support Vector Machine Comparison of Kernels 


Linear Kernel: Cons and Pros 


/ 


К(х, x) = x'x 









e restricted 
—not always separable?! 


e safe—linear first, 
remember? :-) 

e fast—with special QP 
solver in primal 


e very explainable—w and 
SVs say something 

















linear kernel: an important basic tool 
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Кегпе! Support Vector Machine Comparison of Kernels 


Polynomial Kernel: Cons and Pros 


K(x,x') = (с 9x7’)? 












e less restricted than linear 


e numerical difficulty for 
large Q 
e б\+ух!х'|<1:К—0 
° |+ ухх | > 1: К > big 
e three parameters (y, С, Q) 
—more difficult to select 


polynomial kernel: perhaps small-Q only 
—sometimes efficiently done by linear on Фо(х) 
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e strong physical control 
—‘knows’ degree О 








Кегпе! Support Vector Machine Comparison of Kernels 


Gaussian Kernel: Cons and Pros 





K(x,x') = exp(—7||x—x'||?) 


e more powerful than 
linear/poly. 

e bounded—Iess numerical 
difficulty than poly. 

e one parameter 
only—easier to select 
than poly. 


Gaussian kernel: one of most popular but shall be used with care | 
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e mysterious—no м 
e slower than linear 
e too powerful?! 






Кегпе! Support Vector Machine Comparison of Kernels 


Other Valid Kernels 


e kernel represents special similarity: &(x)’(x’) 
e any similarity — valid kernel? not really 


e necessary & sufficient conditions for valid kernel: 
Mercer's condition 


• symmetric 
• let k; = K(x;, xj), the matrix К 
(xi)! 6(xi) (x; )"Ф(хэ) 2-2 $(x1)! (xy) 
2 Ф(Хо)7Ф(х:) (x2) ^(x2) 21: (Xo)! (xy) 
Ф(хн)7Ф(х!) (xy) (x) ... (xw) (xu) 
= [ 21 Zo 221 14 74 боб zw | 


ZZ! must always be positive semi-definite 


define your own kernel: possible, but hard | 


Machine Learning Techniques 
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Кегпе! Support Vector Machine Comparison of Kernels a 
Fun Time 
Which of the following is not a valid kernel? (Hint: Consider two 
1-dimensional vectors xi = (1) and x» = (—1) and check Mercers 
condition.) 








Кегпе! Support Vector Machine Comparison of Kernels 


Fun Time 


Which of the following is not a valid kernel? (Hint: Consider two 
1-dimensional vectors Хү = (1) алах, = (—1) and check Mercer's 





condition.) 
© K(x, x) = (—1 + хх)? 
Ө K(x,x’) = (0 + x'x'? 
Ө K(x,x’) = (1 + x'x'? 
Ө K(x,x’) = (-1- x'xy? 








Reference Answer: (1) 


The kernels in (2) апа (3) are just polynomial 
kernels. The kernel in (4) is equivalent to the 


kernel in (3). Рог e the matrix K formed 
from the kernel and the two examples is not 
positive semi-definite. Thus, the underlying 
kernel is not a valid one. 
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Summary 
@ Embedding Numerous Features: Kernel Models 


Lecture 3: Kernel Support Vector Machine 


ə Kernel Trick 
kernel as shortcut of transform + inner product 
» Polynomial Кегпе! 
embeds specially-scaled polynomial transform 
ə Gaussian Kernel 
embeds infinite dimensional transform 
ə Comparison of Kernels 
linear for efficiency or Gaussian for power 





e next: avoiding overfitting in Gaussian (and other kernels) 
Combining Predictive Features: Aggregation Models 
Ө Distilling Implicit Features: Extraction Models 


